The Infinite Partially Observable Markov Decision Process
Abstract
The Partially Observable Markov Decision Process (POMDP) framework has proven useful in planning domains where agents must balance actions that provide knowledge and actions that provide reward. Unfortunately, most POMDPs are complex structures with a large number of parameters. In many real-world problems, both the structure and the parameters are difficult to specify from domain knowledge alone. Recent work in Bayesian reinforcement learning has made headway in learning POMDP models; however, this work has largely focused on learning the parameters of the POMDP model. We define an infinite POMDP (iPOMDP) model that does not require knowledge of the size of the state space; instead, it assumes that the number of visited states will grow as the agent explores its world and only models visited states explicitly. We demonstrate the iPOMDP on several standard problems.
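The idea of modeling only visited states explicitly can be illustrated with a small sketch. The abstract does not say which prior the iPOMDP uses, so the code below assumes a two-level Chinese-restaurant-style scheme: each (state, action) row reuses successors it has already seen, and otherwise falls back to a shared pool that may create a brand-new state. The class name `LazyStateModel`, the helper `explore`, and the hyperparameters `alpha` and `gamma` are hypothetical names chosen for illustration, not the authors' implementation.

```python
import random
from collections import defaultdict


class LazyStateModel:
    """Transition model whose state space grows only as states are visited."""

    def __init__(self, alpha=1.0, gamma=1.0, seed=0):
        self.alpha = alpha  # assumed: how readily a row consults the shared pool
        self.gamma = gamma  # assumed: how readily the pool creates a new state
        self.rng = random.Random(seed)
        self.num_states = 1  # only the start state (index 0) exists at first
        self.global_counts = defaultdict(int, {0: 1})
        # row_counts[(state, action)][next_state] = transitions seen so far
        self.row_counts = defaultdict(lambda: defaultdict(int))

    def _crp_draw(self, counts, concentration):
        """Pick an existing key in proportion to its count, or None for 'new'."""
        total = sum(counts.values())
        threshold = self.rng.random() * (total + concentration)
        cumulative = 0.0
        for key, count in counts.items():
            cumulative += count
            if threshold < cumulative:
                return key
        return None

    def sample_next_state(self, state, action):
        row = self.row_counts[(state, action)]
        next_state = self._crp_draw(row, self.alpha)
        if next_state is None:
            # The row has no preference yet: ask the pool shared by all rows.
            next_state = self._crp_draw(self.global_counts, self.gamma)
            if next_state is None:
                # Instantiate a state that has never been visited before.
                next_state = self.num_states
                self.num_states += 1
            self.global_counts[next_state] += 1
        row[next_state] += 1
        return next_state


def explore(model, steps, num_actions=2):
    """Roll the model forward with random actions; return the visited states."""
    state = 0
    visited = [state]
    for _ in range(steps):
        action = model.rng.randrange(num_actions)
        state = model.sample_next_state(state, action)
        visited.append(state)
    return visited


if __name__ == "__main__":
    model = LazyStateModel(seed=0)
    trajectory = explore(model, steps=200)
    print("states instantiated after 200 steps:", model.num_states)
```

No table of fixed size is ever allocated: `num_states` increases only when the shared pool opens a new state, so the model's memory footprint tracks the states the agent has actually reached. A full treatment would also need observation and reward models and posterior inference, which this sketch omits.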
Similar resources
A POMDP Framework to Find Optimal Inspection and Maintenance Policies via Availability and Profit Maximization for Manufacturing Systems
Maintenance can either increase or decrease a system's availability, so it is valuable to evaluate a maintenance policy from the cost and availability points of view simultaneously, according to the decision maker's priorities. This study proposes a Partially Observable Markov Decision Process (POMDP) framework for a partially observable and stochastically deteriorating syste...
On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems
We investigate the computability of problems in probabilistic planning and partially observable infinite-horizon Markov decision processes. The undecidability of the string-existence problem for probabilistic finite automata is adapted to show that the following problem of plan existence in probabilistic planning is undecidable: given a probabilistic planning problem, determine whether there ex...
Optimal Control of Infinite Horizon Partially Observable Decision Processes Modeled As Generators of Probabilistic Regular Languages
Decision processes with incomplete state feedback have been traditionally modeled as Partially Observable Markov Decision Processes. In this paper, we present an alternative formulation based on probabilistic regular languages. The proposed approach generalizes the recently reported work on language measure theoretic optimal control for perfectly observable situations and shows that such a fram...
Optimal control of infinite horizon partially observable decision processes modelled as generators of probabilistic regular languages
Decision processes with incomplete state feedback have been traditionally modelled as partially observable Markov decision processes. In this article, we present an alternative formulation based on probabilistic regular languages. The proposed approach generalises the recently reported work on language measure theoretic optimal control for perfectly observable situations and shows that such a f...
Optimal Control for Partially Observable Markov Decision Processes over an Infinite Horizon
In this paper we consider an optimal control problem for partially observable Markov decision processes with finite states, signals and actions over an infinite horizon. It is shown that there are ε-optimal piecewise-linear value functions and piecewise-constant policies which are simple. Simple means that there are only finitely many pieces, each of which is defined on a convex polyhedral set...
The Infinite Regionalized Policy Representation
We introduce the infinite regionalized policy representation (iRPR), as a nonparametric policy for reinforcement learning in partially observable Markov decision processes (POMDPs). The iRPR assumes an unbounded set of decision states a priori, and infers the number of states to represent the policy given the experiences. We propose algorithms for learning the number of decision states while main...